Introduction to Reservoir Computing with ReservoirPy¶

Nathan Trouvain
Inria, IMN, LaBRI - Bordeaux, France




UCLA - November 14th 2023

Table of contents¶

  • Key concepts
  • Timeseries prediction
  • Learned attractors and generative capabilities
  • Online learning
  • Understanding hyperparameters
  • Use Case demo - falling robot
  • Use Case demo - canary songs transcriber

Key concepts and features ¶

  • NumPy, SciPy, and that's it!
  • Efficient execution (distributed implementation)
  • Online and offline learning rules
  • Complex model architectures enabled
  • Documentation: https://reservoirpy.readthedocs.io/en/latest/
  • GitHub: https://github.com/reservoirpy/reservoirpy

General info¶

  • Everything is NumPy (and more generally "standard" scientific Python)
  • The first axis of arrays always represents time.
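Concretely, a multivariate timeseries is stored with time on axis 0 and features on axis 1 (a minimal sketch, not from the slides):

```python
import numpy as np

# A 3-dimensional timeseries of 1000 timesteps:
# axis 0 indexes time, axis 1 indexes the feature dimensions.
X = np.random.default_rng(0).normal(size=(1000, 3))

print(X.shape)     # (1000, 3)
print(X[0].shape)  # a single timestep: (3,)
```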

Timeseries prediction ¶

The Lorenz attractor

$$ \begin{split} \dot{x}(t) &= \sigma (y(t) - x(t)) \\ \dot{y}(t) &= \rho x(t) - y(t) - x(t)z(t) \\ \dot{z}(t) &= x(t)y(t) - \beta z(t) \end{split} $$

  • describes convection movements in a fluid. Highly chaotic!
In [2]:
from reservoirpy.datasets import lorenz

timesteps = 3000
X = lorenz(timesteps, x0=[17.67, 12.93, 43.91])
In [4]:
plot_lorenz(X, 1000)

Knowing series value at timestep $t$:

  • How can we predict $t+1$, $t+100$...?
  • How can we predict $t+1$, $t+2$, $\dots$, $t+n$ ?
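Building a forecasting dataset amounts to shifting the series against itself, which is roughly what `to_forecasting` does below (a simplified sketch; `shift_forecast` is a hypothetical helper, and the real function also offers train/test splitting):

```python
import numpy as np

def shift_forecast(X, forecast):
    """Pair each timestep with the value `forecast` steps later."""
    return X[:-forecast], X[forecast:]

X = np.arange(20.0).reshape(-1, 1)      # toy 1D series: 0, 1, ..., 19
x, y = shift_forecast(X, forecast=10)
print(x[0], y[0])                       # [0.] [10.]
```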

10-step-ahead forecasting¶

Predict $P(t + 10)$ knowing $P(t)$.

In [6]:
from reservoirpy.datasets import to_forecasting

x, y = to_forecasting(X, forecast=10)
X_train1, y_train1 = x[:2000], y[:2000]
X_test1, y_test1 = x[2000:], y[2000:]

plot_train_test(X_train1, y_train1, X_test1, y_test1)

A quick reservoir reminder¶


$$ x[t+1]= (1 - \alpha) x[t] + \alpha f(W \cdot x[t] + W_{in} \cdot u[t] + W_{fb} \cdot y[t]) $$ $$ y[t]= W_{out}^{\intercal} x[t] $$
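The state update above can be sketched in plain NumPy (a toy sketch with dense random weights, omitting the feedback term $W_{fb} \cdot y[t]$):

```python
import numpy as np

rng = np.random.default_rng(42)
units, dim = 100, 3
alpha = 0.3                                # leaking rate
W = rng.normal(size=(units, units)) * 0.1  # recurrent weights (toy scaling)
W_in = rng.normal(size=(units, dim))       # input weights

def update(x, u, alpha=alpha):
    """One leaky-integrator step: x[t+1] = (1-a) x[t] + a tanh(W x + W_in u)."""
    return (1 - alpha) * x + alpha * np.tanh(W @ x + W_in @ u)

x = np.zeros(units)
u = np.ones(dim)
x = update(x, u)
print(x.shape)  # (100,)
```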

Echo State Networks with ReservoirPy¶

Models are built out of Nodes:


ESN preparation¶

In [8]:
units = 100               # - number of units
leak_rate = 0.3           # - leaking rate
spectral_radius = 0.95    # - spectral radius
input_scaling = 0.5       # - input scaling (also called input gain)
connectivity = 0.1        # - recurrent weights connectivity probability
input_connectivity = 0.2  # - input weights connectivity probability
regularization = 1e-4     # - L2 regularization coefficient
transient = 100           # - number of warmup steps
seed = 1234               # - used for reproducibility
In [10]:
from reservoirpy.nodes import Reservoir, Ridge

reservoir = Reservoir(units, input_scaling=input_scaling, sr=spectral_radius,
                      lr=leak_rate, rc_connectivity=connectivity,
                      input_connectivity=input_connectivity, seed=seed)

readout   = Ridge(ridge=regularization)
In [11]:
esn = reservoir >> readout
In [12]:
reservoir_fb = reservoir << readout

esn_fb = reservoir_fb >> readout

ESN training¶

Learning is performed offline: the model is updated only once, using all available training data.

In [13]:
esn.fit(X_train1, y_train1, warmup=transient);
In [15]:
plot_readout(readout)

ESN forecast¶

In [16]:
y_pred1 = esn.run(X_test1)
In [17]:
plot_results(y_pred1, y_test1)

Determination coefficient $R^2$ and normalized root mean square error (NRMSE):

In [18]:
rsquare(y_test1, y_pred1), nrmse(y_test1, y_pred1)
Out[18]:
(0.9966686152092658, 0.011448035696843583)
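These two metrics can be reproduced by hand (a sketch; ReservoirPy's `rsquare` and `nrmse` helpers may use slightly different normalizations, e.g. NRMSE can also be normalized by the variance or the mean):

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual / total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

def nrmse_minmax(y_true, y_pred):
    """RMSE normalized by the range of the target series."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return rmse / (y_true.max() - y_true.min())

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])
print(r2(y_true, y_pred))  # 0.98
print(nrmse_minmax(y_true, y_pred))
```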

Closed-loop reservoir¶

  • Train the ESN on solving a 1-step-ahead prediction (learn the flow function $f(x_t) = x_{t+1}$).
  • Use ESN to predict on its own activity ("generative" mode).
In [19]:
units = 300               # - number of units
leak_rate = 0.3           # - leaking rate
spectral_radius = 1.25    # - spectral radius
input_scaling = 0.1       # - input scaling (also called input gain)
connectivity = 0.1        # - recurrent weights connectivity probability
input_connectivity = 0.2  # - input weights connectivity probability
regularization = 1e-4     # - L2 regularization coefficient
transient = 100           # - number of warmup steps
seed = 1234               # - used for reproducibility

Forecast of close future¶

In [21]:
esn = reset_esn()

x, y = to_forecasting(X, forecast=1)
X_train3, y_train3 = x[:2000], y[:2000]
X_test3, y_test3 = x[2000:], y[2000:]

esn = esn.fit(X_train3, y_train3, warmup=transient)

Closed-loop model¶

  • 100 timesteps used as warmup;
  • 300 timesteps created from reservoir dynamics, without external inputs.
In [22]:
seed_timesteps = 100

warming_inputs = X_test3[:seed_timesteps]

warming_out = esn.run(warming_inputs)  # warmup
In [23]:
nb_generations = 500

X_gen = np.zeros((nb_generations, 3))
y = warming_out[-1]
for t in range(nb_generations):  # generation
    y = esn(y)
    X_gen[t, :] = y
In [24]:
X_t = X_test3[seed_timesteps: nb_generations+seed_timesteps]
plot_generation(X_gen, X_t, warming_out=warming_out, warming_inputs=warming_inputs)
In [25]:
plot_attractors(X_gen, X_t, warming_inputs, warming_out)

Online learning ¶

Online learning happens incrementally: the model can be updated at every timestep.

In the following, we will use the Recursive Least Squares (RLS) algorithm.
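RLS updates the readout after every sample by maintaining an estimate of the inverse input-correlation matrix. A minimal sketch on a toy linear regression (a textbook formulation, not ReservoirPy's internal implementation):

```python
import numpy as np

def rls_step(w, P, x, y_target, lam=1.0):
    """One RLS update of readout weights w with inverse-correlation matrix P."""
    Px = P @ x
    k = Px / (lam + x @ Px)          # gain vector
    err = y_target - w @ x           # a-priori prediction error
    w = w + err * k                  # weight correction
    P = (P - np.outer(k, Px)) / lam  # inverse-correlation update
    return w, P

rng = np.random.default_rng(0)
n = 5
w_true = rng.normal(size=n)          # hidden target mapping
w = np.zeros(n)
P = np.eye(n) * 100.0                # large initial "covariance"
for _ in range(200):
    x = rng.normal(size=n)
    w, P = rls_step(w, P, x, w_true @ x)

print(np.abs(w - w_true).max())      # small: w converges to w_true
```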

In [27]:
from reservoirpy.nodes import RLS

reservoir = Reservoir(units, input_scaling=input_scaling, sr=spectral_radius,
                      lr=leak_rate, rc_connectivity=connectivity,
                      input_connectivity=input_connectivity, seed=seed)

readout   = RLS()  # Recursive Least Squares


esn_online = reservoir >> readout

Step-by-step training¶

In [28]:
outputs_pre = np.zeros(X_train1.shape)
for t, (x, y) in enumerate(zip(X_train1, y_train1)): # for each timestep do :
    prediction = esn_online.train(np.atleast_2d(x), np.atleast_2d(y))
    outputs_pre[t, :] = prediction
In [29]:
plot_results(outputs_pre, y_train1, sample=100)
In [30]:
plot_results(outputs_pre, y_train1, sample=500)

Whole sequence training¶

In [32]:
esn_online.train(X_train1, y_train1)

pred_online = esn_online.run(X_test1)  # Wout is now learned and fixed
In [33]:
plot_results(pred_online, y_test1, sample=500)

Determination coefficient $R^2$ and NRMSE:

In [34]:
rsquare(y_test1, pred_online), nrmse(y_test1, pred_online)
Out[34]:
(0.9954163172865621, 0.013428448857951327)

Diving in the reservoir¶

In [35]:
units = 300               # - number of units
leak_rate = 0.3           # - leaking rate
spectral_radius = 1.25    # - spectral radius
input_scaling = 0.1       # - input scaling (also called input gain)
connectivity = 0.1        # - recurrent weights connectivity probability
input_connectivity = 0.2  # - input weights connectivity probability
regularization = 1e-4     # - L2 regularization coefficient
transient = 100           # - number of warmup steps
seed = 1234               # - used for reproducibility

1. The spectral radius¶

The spectral radius is the largest absolute eigenvalue of the recurrent weight matrix ($W$).
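Since scaling a matrix scales all its eigenvalues by the same factor, $W$ can be rescaled to any target spectral radius by dividing by its current one (a sketch with a dense NumPy matrix; ReservoirPy works with sparse matrices internally):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(100, 100))

def spectral_radius(W):
    """Largest absolute eigenvalue of W."""
    return np.max(np.abs(np.linalg.eigvals(W)))

# Rescale W so that its spectral radius equals a target value.
target = 0.95
W_scaled = W * (target / spectral_radius(W))
print(round(spectral_radius(W_scaled), 2))  # 0.95
```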

In [36]:
states = []
radii = [0.1, 1.25, 10.0]
for sr in radii:
    reservoir = Reservoir(units, sr=sr, input_scaling=0.001, lr=leak_rate, rc_connectivity=connectivity,
                         input_connectivity=input_connectivity)

    s = reservoir.run(X_test1[:500])
    states.append(s)
In [37]:
units_nb = 20

plt.figure(figsize=(15, 8))
for i, s in enumerate(states):
    plt.subplot(len(radii)*100+10+i+1)
    plt.plot(s[:, :units_nb], alpha=0.6)
    plt.ylabel(f"$sr={radii[i]}$")
plt.xlabel(f"Activations ({units_nb} neurons)")
plt.show()
  • $-$ spectral radius $\rightarrow$ stable dynamics

  • $+$ spectral radius $\rightarrow$ chaotic dynamics

2. The input scaling¶

It is a coefficient applied to $W_{in}$. It can be seen as a gain applied to the inputs.

In [38]:
states = []
scalings = [0.00001, 0.001, 2.0]
for iss in scalings:
    reservoir = Reservoir(units, sr=spectral_radius, input_scaling=iss, lr=leak_rate,
                          rc_connectivity=connectivity, input_connectivity=input_connectivity)

    s = reservoir.run(X_test1[:500])
    states.append(s)
In [39]:
units_nb = 20

plt.figure(figsize=(15, 8))
for i, s in enumerate(states):
    plt.subplot(len(scalings)*100+10+i+1)
    plt.plot(s[:, :units_nb], alpha=0.6)
    plt.ylabel(f"$iss={scalings[i]}$")
plt.xlabel(f"Activations ({units_nb} neurons)")
plt.show()

Average correlation of reservoir states and inputs:

  • $+$ input scaling $\rightarrow$ activity is bounded to input dynamics
  • $-$ input scaling $\rightarrow$ activity is freely evolving

Input scaling may be used to adjust relative importance of different inputs.
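For instance, one input channel can be emphasized over another by scaling the columns of the input array before feeding it (a hand-rolled sketch; ReservoirPy's `input_scaling` parameter may also accept per-dimension values, check the documentation):

```python
import numpy as np

U = np.random.default_rng(1).normal(size=(500, 2))  # two input channels

# Give channel 0 ten times more weight than channel 1.
gains = np.array([1.0, 0.1])
U_scaled = U * gains  # broadcasting over the time axis

print(U_scaled[:, 1].std() / U[:, 1].std())  # 0.1
```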

3. The leaking rate¶

$$ x(t+1) = \underbrace{\color{red}{(1 - \alpha)} x(t)}_{\text{current state}} + \underbrace{\color{red}\alpha f(u(t+1), x(t))}_{\text{new inputs}} $$

with $\alpha \in [0, 1]$ and:

$$ f(u, x) = \tanh(W_{in} \cdotp u + W \cdotp x) $$

In [40]:
states = []
rates = [0.02, 0.2, 0.9]
for lr in rates:
    reservoir = Reservoir(units, sr=spectral_radius, input_scaling=input_scaling, lr=lr,
                          rc_connectivity=connectivity, input_connectivity=input_connectivity)

    s = reservoir.run(X_test1[:500])
    states.append(s)
In [41]:
units_nb = 20

plt.figure(figsize=(15, 8))
for i, s in enumerate(states):
    plt.subplot(len(rates)*100+10+i+1)
    plt.plot(s[:, :units_nb] + 2*i)
    plt.ylabel(f"$lr={rates[i]}$")
plt.xlabel(f"States ({units_nb} neurons)")
plt.show()
  • $+$ leaking rate $\rightarrow$ low inertia, short activity timescale
  • $-$ leaking rate $\rightarrow$ strong inertia, long activity timescale

The leaking rate acts as a proxy for the inverse of the reservoir neurons' time constant.
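Indeed, with no input the state decays as $(1 - \alpha)^t$, so past activity fades over roughly $1/\alpha$ timesteps. A numeric sketch (hypothetical helper, not part of ReservoirPy):

```python
import numpy as np

def decay_steps(alpha, threshold=np.exp(-1)):
    """Steps for a unit state to decay below 1/e without input."""
    x, t = 1.0, 0
    while x > threshold:
        x *= (1 - alpha)  # leaky update with zero input
        t += 1
    return t

for alpha in (0.02, 0.2, 0.9):
    print(alpha, decay_steps(alpha))  # smaller leaking rate -> longer memory
```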

Use case: falling robot ¶

In [43]:
features = ['com_x', 'com_y', 'com_z', 'trunk_pitch', 'trunk_roll', 'left_x', 'left_y',
            'right_x', 'right_y', 'left_ankle_pitch', 'left_ankle_roll', 'left_hip_pitch',
            'left_hip_roll', 'left_hip_yaw', 'left_knee', 'right_ankle_pitch',
            'right_ankle_roll', 'right_hip_pitch', 'right_hip_roll',
            'right_hip_yaw', 'right_knee']

prediction = ['fallen']
force = ['force_orientation', 'force_magnitude']
In [48]:
plot_robot(Y, Y_target, F)

ESN training¶

Using the ESN class, an optimized and distributed implementation of Echo State Networks.

In [50]:
from reservoirpy.nodes import ESN

reservoir = Reservoir(300, lr=0.5, sr=0.99, input_bias=False)
readout   = Ridge(ridge=1e-3)
esn = ESN(reservoir=reservoir, readout=readout, workers=-1)  # parallel computations: on
In [51]:
esn = esn.fit(X_train, y_train)
In [52]:
res = esn.run(X_test)
In [55]:
plot_robot_results(y_test, res)
In [56]:
print("Mean RMSE:", f"{np.mean(scores):.4f}", "±", f"{np.std(scores):.5f}")
print("Mean RMSE (with threshold):", f"{np.mean(filt_scores):.4f}", "±", f"{np.std(filt_scores):.5f}")
Mean RMSE: 0.1571 ± 0.09481
Mean RMSE (with threshold): 0.1339 ± 0.14254
In [57]:
acc = 0.0
for y_pred, y_true in zip(res, y_test):
    true_fall = 1 if np.max(y_true) > 0.8 else 0
    pred_fall = 1 if np.max(y_pred) > 0.8 else 0
    acc += true_fall == pred_fall
print("Accuracy: ", acc / len(y_test))
Accuracy:  0.9806094182825484

Use case: anytime decoding of canary songs¶

Dataset can be found on Zenodo: https://zenodo.org/record/4736597


Decoded song units: phrases, which are repetitions of identical syllables.

  • One label per phrase/syllable type, with phrase onset and offset time.
  • One SIL label is used to denote silence.
In [58]:
im = plt.imread("./static/canary_outputs.png")
plt.figure(figsize=(15, 15)); plt.imshow(im); plt.axis('off'); plt.show()

ESN training¶

In [63]:
esn = esn.fit(X_train, y_train)
In [64]:
outputs = esn.run(X_test)
In [66]:
scores  # for each song in the test set
Out[66]:
[0.9494949494949495,
 0.909044193216855,
 0.9531759739013625,
 0.9407697626181147,
 0.9638157894736842,
 0.9478260869565217,
 0.892835458409229,
 0.9403111739745403,
 0.9212787899621864,
 0.8762235554152195]
In [67]:
print("Mean accuracy:", f"{np.mean(scores):.4f}", "±", f"{np.std(scores):.5f}")
Mean accuracy: 0.9295 ± 0.02718

Thank you! Now it is yours to try¶

Nathan Trouvain
Inria, IMN, LaBRI - Bordeaux, France




UCLA - November 14th 2023